fix(recall): allow exact filtering of untagged observations (#2295) #2322

Closed
wangzupeng12061 wants to merge 1 commit into vectorize-io:main from wangzupeng12061:fix/recall-untagged-observations

@wangzupeng12061

Contributor

Summary

Fixes #2295.

Allow recall to select only global/untagged observations by using an empty tag
set with tags_match="exact".

This enables users to switch between shared observations and tagged observation
scopes without adding a synthetic "global" tag or maintaining negative filters.

Root cause

The centralized tag filters treated None and [] as "no filter" before
checking the matching mode.

As a result:

  • SQL retrieval applied no filter and returned every scope.
  • Python post-processing returned every result.
  • An exact empty compound tag group excluded all untagged results.
  • Link-expansion skipped Python filtering when the tag list was empty.

This contradicted the existing scope semantics where the empty tag set
represents the global observation scope.
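
To illustrate the root cause, here is a minimal sketch of a filter that returns early on an empty tag set before it looks at the matching mode. The function name `matches_before_fix` and its signature are hypothetical and are not taken from the codebase; only the ordering of the two checks reflects the description above.

```python
from typing import Optional, Sequence


def matches_before_fix(
    result_tags: Optional[Sequence[str]],
    filter_tags: Optional[Sequence[str]],
    tags_match: str = "any",
) -> bool:
    """Hypothetical pre-fix filter: the empty check runs before the mode check."""
    # None and [] are both falsy, so every result passes here,
    # even when the caller asked for tags_match="exact".
    if not filter_tags:
        return True
    if tags_match == "exact":
        # Assumption for this sketch: "exact" means set equality.
        return set(result_tags or []) == set(filter_tags)
    raise NotImplementedError("other modes are out of scope for this sketch")


# A tagged result leaks through an empty exact filter.
print(matches_before_fix(["team-a"], [], "exact"))
```

With this ordering, an empty exact filter can never narrow results to the untagged scope, because the mode is never consulted.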

Fix

  • Interpret an empty tag set as the global/untagged scope in exact mode.
  • Match both historical NULL tags and current empty-array tags.
  • Keep parameter indexes unchanged because the empty-scope SQL clause requires
    no bind parameter.
  • Apply the same semantics to flat SQL filters, Python post-processing, and
    compound tag groups.
  • Run link-expansion post-filtering for empty exact scopes.
  • Preserve existing behavior for all other modes: empty tags still mean no
    filtering for any, all, any_strict, and all_strict.
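
The Python-side semantics listed above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: the name `is_visible` is hypothetical, "exact" is assumed to mean set equality, and non-exact matching with a non-empty filter is deliberately left out.

```python
from typing import Optional, Sequence

NON_EXACT_MODES = {"any", "all", "any_strict", "all_strict"}


def is_visible(
    result_tags: Optional[Sequence[str]],
    filter_tags: Optional[Sequence[str]],
    tags_match: str = "any",
) -> bool:
    """Hypothetical post-filter: the mode is checked before the empty shortcut."""
    wanted = set(filter_tags or [])
    # Historical NULL tags and current empty arrays both count as untagged.
    actual = set(result_tags or [])

    if tags_match == "exact":
        # An empty filter now selects only the global/untagged scope.
        return actual == wanted
    if tags_match not in NON_EXACT_MODES:
        raise ValueError(f"unknown tags_match: {tags_match!r}")
    if not wanted:
        # Unchanged behavior: empty tags mean no filtering in the other modes.
        return True
    raise NotImplementedError("non-exact matching is out of scope for this sketch")


rows = [None, [], ["team-a"]]
print([is_visible(tags, [], "exact") for tags in rows])
print([is_visible(tags, [], "any") for tags in rows])
```

Normalizing both sides to sets first means `None` and `[]` behave identically on the filter side and on the result side, which is the property the fix relies on.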

Tests

Added regression coverage for:

  • flat SQL filtering with None and [];
  • bind parameter numbering after an empty exact scope;
  • Python-side filtering of NULL, empty, and tagged results;
  • compound exact-empty tag groups in SQL and Python;
  • recall API filtering that returns an untagged memory while excluding a tagged
    memory.
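
The bind-parameter numbering case can be illustrated with a small clause builder. This is a sketch, not the project's query code: `build_tags_clause`, the `tags` column name, the `text[]` column type, and the `$n` placeholder style are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def build_tags_clause(
    filter_tags: Optional[Sequence[str]],
    tags_match: str,
    next_param: int,
    column: str = "tags",
) -> Tuple[str, List[list], int]:
    """Return (sql_fragment, bind_values, next_free_param_index)."""
    if not filter_tags:
        if tags_match == "exact":
            # Matches NULL and empty arrays; uses no bind parameter,
            # so the parameter index is returned unchanged.
            sql = f"({column} IS NULL OR cardinality({column}) = 0)"
            return sql, [], next_param
        # Other modes: empty tags still mean "no filter".
        return "", [], next_param
    if tags_match == "exact":
        # Mutual containment as a stand-in for array set equality.
        sql = f"({column} @> ${next_param} AND {column} <@ ${next_param})"
        return sql, [list(filter_tags)], next_param + 1
    raise NotImplementedError("other modes are out of scope for this sketch")


clause, values, next_idx = build_tags_clause([], "exact", next_param=3)
print(clause, values, next_idx)
```

Because the empty-scope clause consumes no placeholder, a condition appended after it still binds to `$3` in this example, which is what a numbering regression test would check.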

Verification

Run on Linux x86_64 with Python 3.11, Node.js 22, and PostgreSQL 17 + pgvector:

  • 91 passed - tests/test_tags_visibility.py
  • 10 passed - tests/test_graph_filtering.py
  • ty check hindsight_api passed
  • Full ./scripts/hooks/lint.sh passed

@nicoloboschi

Collaborator

Thanks for this @wangzupeng12061 — the approach here is solid. I've opened #2364 which implements the same exact-empty-scope fix and additionally updates the RecallRequest API descriptions and regenerates the client SDKs (Go/Python/TypeScript), OpenAPI spec, and docs so the user-facing surface is complete. It also keeps your parameter-offset regression test.

Proposing #2364 to supersede this PR. Closing in favor of it (credit to your investigation of the root cause).

@nicoloboschi

Collaborator

Closing in favor of #2364, which supersedes this with the same fix plus the regenerated client SDKs, OpenAPI spec, and docs. Thanks again @wangzupeng12061 for the root-cause work — credited in the superseding PR.

Development

Successfully merging this pull request may close these issues.

Allow filtering to untagged observations